Abstract

Background: An increasing number of long noncoding RNAs (lncRNAs) have been identified recently. Unlike
the others, which function in cis to regulate local gene expression, the newly identified HOTAIR is located
between HoxC11 and HoxC12 in the human genome and regulates HoxD expression in multiple tissues. Like the
well-characterised lncRNA Xist, HOTAIR binds to polycomb proteins to methylate histones at multiple HoxD loci,
but unlike Xist, many details of its structure and function, as well as its trans regulation, remain unclear. Moreover,
HOTAIR is involved in the aberrant regulation of gene expression in cancer.

Results: To identify conserved domains in HOTAIR and study the phylogenetic distribution of this lncRNA, we
searched the genomes of 10 mammalian and 3 non-mammalian vertebrates for matches to its 6 exons and the
two conserved domains within the 1800 bp exon6 using Infernal. There was just one high-scoring hit for each
mammal, but many low-scoring hits were found in both mammals and non-mammalian vertebrates. These hits
and their flanking genes in four placental mammals and platypus were examined to determine whether HOTAIR
contained elements shared by other lncRNAs. Several of the hits were within unknown transcripts or ncRNAs, many
were within introns of, or antisense to, protein-coding genes, and conservation of the flanking genes was observed
only between human and chimpanzee. Phylogenetic analysis revealed discrete evolutionary dynamics for
orthologous sequences of HOTAIR exons. Exon1 at the 5' end and a domain in exon6 near the 3' end, which
contain domains that bind to multiple proteins, have evolved faster in primates than in other mammals. Structures
were predicted for exon1, two domains of exon6 and the full HOTAIR sequence. The sequence and structure of
two fragments, in exon1 and domain B of exon6 respectively, were found to occur robustly in the predicted
structures of exon1, domain B of exon6 and the full HOTAIR in mammals.

Conclusions: HOTAIR exists in mammals, has poorly conserved sequences and considerably conserved structures, 
and has evolved faster than nearby HoxC genes. Exons of HOTAIR show distinct evolutionary features, and a 239 
bp domain in the 1804 bp exon6 is especially conserved. These features, together with the absence of some exons 
and sequences in mouse, rat and kangaroo, suggest ab initio generation of HOTAIR in marsupials. Structure 
prediction identifies two fragments in the 5' end exon1 and the 3' end domain B of exon6, with sequence and
structure invariably occurring in various predicted structures of exon1, the domain B of exon6 and the full HOTAIR.



Background 

Consistent with pervasive transcription of the genome
[1,2], many noncoding RNAs (ncRNAs) have recently
been discovered. In addition to abundant microRNAs
(reviewed recently in [3,4]), an increasing number of
long noncoding RNAs (lncRNAs) have been identified,
and their crucial functions have been experimentally
confirmed. One key aspect of their functions is tissue-specific
genome modification [5,6]. After a cell is fully
differentiated, many genes are specifically silenced by
polycomb proteins and lncRNA-mediated histone methylation
rather than by a large army of negative transcription
factors [7]. Another important aspect is genomic
imprinting [8]. One typical example is the Xist-mediated
inactivation of the whole X chromosome. Because diverse
tissue-specific histone methylation and gene silencing are
performed by only a handful of polycomb proteins [9],
the great enigma of genome modification is how a few
polycomb proteins dynamically and accurately target specific
DNA sequences. The discovery of a large number of
lncRNAs should provide key information.

The most studied case of genome modification is X 
inactivation, the silencing of the majority of genes on one 
of the X chromosomes in somatic cells to balance gene 
copy number during mammalian embryogenesis [10,11]. 
X inactivation is mediated by the lncRNA Xist [12,13], and
its details have recently been elucidated [14,15]. Regarding
gene silencing and dosage compensation apart from the X
chromosome, several lncRNAs, including HOTAIR, play
essential roles. HOTAIR is co-expressed with the HoxC
genes, interacts with polycomb proteins and functions
in trans to repress HoxD expression [16,17]. In addition to
creating and maintaining spatiotemporally patterned
HoxD expression in multiple tissues during embryogenesis,
HOTAIR is also involved in aberrant gene expression
in cancers [18]. The recently discovered functions of Xist,
HOTAIR and other lncRNAs suggest the hypothesis
that numerous lncRNAs should exist to bridge the
limited number of polycomb proteins and the diverse
tissue-specific genome modification. Moreover, many of
them should be evolutionarily conserved.

Tiling arrays are widely used to discover new transcripts, 
especially new ncRNAs [19]. Although this method is 
convenient and powerful, it can only uncover noncoding 
transcripts expressed at particular times in particular 
cells. If functional domains of an lncRNA, such as Xist or
HOTAIR, interact with polycomb proteins, they are likely
to be conserved in animals and possibly shared by other
lncRNAs. This pattern of conservation allows computational
genome analysis to be used to identify new lncRNAs
and their functional domains, as has been successfully
performed for microRNAs [20]. The origin and evolution of
these lncRNAs are also of great importance and interest,
but they have so far hardly been addressed, except for Xist
[21,22]. In this study, we computationally investigated
HOTAIR, the first lncRNA shown to function in trans.
Specifically, we investigated the following questions: (1) 
whether HOTAIR exists in all mammals or vertebrates, (2) 
whether it has functional domains shared by other known 
or potential ncRNAs and whether they are evolutionarily 
conserved, (3) the evolutionary features of HOTAIR, and 
(4) the possible structures of its functional domains. We 
addressed the first question using Infernal, a structure- 
based RNA homology search program [23], to search the 
genomes of 13 vertebrates with exons of HOTAIR. We 
addressed the second question by thoroughly evaluating 
all of the hits in five animals. We addressed the third ques- 
tion using Paml and EvoNC to analyze sequences ortholo- 
gous to HOTAIR exons [24,25]. Finally, we addressed the 
fourth question using PMmulti and Mfold to predict the 
structures of HOTAIR exons and the full HOTAIR 
sequences [26,27]. Our results indicated that orthologues 



of HOTAIR existed only in mammals and that HOTAIR 
has evolved faster than the neighbouring HoxC genes. 
Moreover, HOTAIR exons showed discrete evolutionary 
dynamics, with some having evolved significantly faster in 
primates. Hits of exons as a whole, with high and low 
scores, were poorly conserved in animals, except between 
closely related species. Many hits fell within introns of, or 
were antisense to, protein coding genes. A comparison of 
all the predicted two-dimensional (2D) structures of exon1
and the two conserved domains of exon6 revealed two
invariable fragments in these structures. These results
uncovered multiple facets of HOTAIR, and the implications
of our results within the wide range of lncRNA evolution
and function are discussed.

Methods 

Data 

The sequence of human HOTAIR was obtained from the 
National Center for Biotechnology Information (NCBI) 
database (accession number NR_003716.2). The unmasked 
genome data (Ensembl database version 57) of human 
(GRCh37.p2, Feb. 2009), chimpanzee (CHIMP2.1, Mar.
2006), rhesus monkey (MMUL 1.0, Feb. 2006), gorilla
(gorGor3, Dec. 2009), cow (Btau_4.0, Oct. 2007), horse
(EquCab2, Sep. 2007), dolphin (turTru1, Jul. 2008), dog
(CanFam2.0, May 2006), mouse (NCBI m37, Apr. 2007),
rat (RGSC 3.4, Dec. 2004), platypus (Ornithorhynchus_anatinus-5.0,
Dec. 2005), chicken (WASHUC2, May 2006),
and zebrafish (Zv9, Apr. 2010) were downloaded from
Ensembl. The sequences corresponding to the rat ortholo- 
gue of HOTAIR exon6 (consisting of two domains of 
HOTAIR exon6 that are conserved in mammals) and the 
sequences corresponding to the short exon of human 
HoxC12 aligned by Multiz against 22 mammals (human, 
chimpanzee, rhesus monkey, gorilla, hedgehog, dog, rat, 
mouse, dolphin, elephant, orangutan, baboon, guinea pig, 
rabbit, cow, horse, marmoset, kangaroo, armadillo, hyrax, 
lemur, and platypus) were obtained from the UCSC Genome
Browser database [28].

Obtaining sequences orthologous to the HOTAIR exons

Each of the 6 exons of human HOTAIR was used as a
query to search the genomes of rhesus monkey (rheMac2,
Jan. 2005) and dog (Broad/canFam2, May 2005) in
Ensembl using BLAT [29]. The sequences orthologous
(best hit) to each exon, except exon6, in human, rhesus
monkey and dog (exon2 and exon5 did not have good hits
in dog) were aligned using PMmulti (v1.6, [30]); the
sequences orthologous to exon6 were aligned using
LocARNA (v1.5.4, [31]). From the 6 alignments, 6
queries (query1 to query6) were built using the cmbuild
and cmcalibrate functions of Infernal (v1.0.1, [23]) and
then used to search the whole genomes of 13 vertebrates
using the cmsearch function of Infernal. Two short
domains of exon6, the ~235 bp domain A and the ~239
bp domain B, were identified in orthologues of exon6 in
all 10 mammals. From the orthologous sequences of the two
domains in human, rhesus monkey and dog, two additional
queries (query6a and query6b) were built using the
cmbuild and cmcalibrate functions of Infernal. They were
then used to search the whole genomes of the 13 vertebrates
using the cmsearch function of Infernal.
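The build, calibrate and search steps described above can be sketched as three Infernal invocations per query. The snippet below is only an illustration: the file names are hypothetical, no options are passed because the text states that defaults were used, and the commands are assembled as argument lists rather than executed.

```python
def infernal_commands(alignment, cm_file, genome_fasta):
    """Return the three Infernal command lines as argument lists (nothing is run)."""
    return [
        ["cmbuild", cm_file, alignment],      # build a covariance model from the alignment
        ["cmcalibrate", cm_file],             # calibrate the model's score statistics
        ["cmsearch", cm_file, genome_fasta],  # scan a genome with the calibrated model
    ]


# Hypothetical file names: one Stockholm alignment per query, one FASTA file per genome.
QUERIES = ["query1", "query2", "query3", "query4", "query5",
           "query6", "query6a", "query6b"]

if __name__ == "__main__":
    for query in QUERIES:
        for cmd in infernal_commands(f"{query}.sto", f"{query}.cm", "genome.fa"):
            print(" ".join(cmd))
```

Each list could be handed to `subprocess.run(cmd, check=True)` on a machine where Infernal is installed; cmcalibrate and whole-genome cmsearch runs can take a long time.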

Sequence alignment and structure prediction 

Sequences of the 10 mammals that were orthologous to 
HOTAIR exon6 were aligned using LocARNA, and 
sequences orthologous to all of the other HOTAIR exons 
from the 10 mammals were aligned using PMmulti for 
phylogenetic analysis. Structures were predicted for the
orthologues of exon1, exon6 domain A and exon6
domain B using PMmulti and Mfold
(http://mfold.rna.albany.edu/?q=mfold, [26,27]), and the predicted
structures were displayed using either Mfold or PseudoViewer
(v2.5, [32]). In all cases, default parameters were used
unless otherwise indicated. 

Phylogenetic analysis 

Using orthologous sequences of HOTAIR exon6 and the
concatenated homologous sequences of exon1, exon3,
exon4 and exon5 in the 10 mammals, two phylogenetic
trees were built using the dnadist and kitsch functions of
Phylip (v3.69, [33]). Phylogenetic analysis of the two trees
was performed using the baseml function of Paml (v4.4,
[24]). Fixed parameters included model = 4 (the HKY85
nucleotide substitution model); fix_kappa = 0 and kappa
= 2; fix_alpha = 0 and alpha = 0.5; ncatG = 5; fix_rho = 1
and rho = 0; and cleandata = 0. The parameters kappa
(κ), alpha (α), local clock and rates of substitution were
estimated under different conditions [34-37]. The evolution
of the two conserved domains of exon6 and the short
exon of the neighbouring HoxC12 gene was analyzed in
22 mammals using EvoNC [25].
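As a rough illustration of how the listed baseml settings fit together, the sketch below writes them into a control file. Only the values named in the text come from the study; the file names and the `clock` argument (0 = no clock, 1 = global clock, 2 = local clocks, following the PAML convention) are assumptions added for the example.

```python
def baseml_ctl(seqfile, treefile, outfile, clock=0):
    """Return the text of a baseml control file using the settings listed above."""
    settings = {
        "seqfile": seqfile,     # hypothetical alignment file name
        "treefile": treefile,   # hypothetical tree file name
        "outfile": outfile,     # hypothetical output file name
        "model": 4,             # HKY85
        "clock": clock,         # varied between runs (assumption: 0, 1 or 2)
        "fix_kappa": 0, "kappa": 2,    # estimate kappa, starting from 2
        "fix_alpha": 0, "alpha": 0.5,  # estimate gamma shape, starting from 0.5
        "ncatG": 5,             # number of discrete gamma categories
        "fix_rho": 1, "rho": 0, # no correlation of rates between adjacent sites
        "cleandata": 0,         # keep sites with gaps or ambiguity codes
    }
    return "\n".join(f"{key:>10} = {value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    print(baseml_ctl("exon6.phy", "treeB.nwk", "exon6_noclock.out", clock=0))
```

Switching `clock` between runs gives the pairs of nested models that are compared with log-likelihood ratio tests in the Results.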

Results 

The sequences of orthologous HOTAIR exons are poorly 
conserved 

lncRNAs that function in cis, including Xist, AIR and
Kcnq1ot1 [38-41], should evolve closely with their nearby
target genes. HOTAIR is the first lncRNA that has been
found to function in trans to regulate remote gene expres- 
sion; it is co-expressed with the HoxC genes on chromo- 
some 12 and down-regulates HoxD genes on chromosome 
2 in particular tissues in human [16]. Because this regula- 
tion occurs between distant genomic loci, whether it exists 
in other mammals and non-mammalian vertebrates is of 
great interest. We first searched several mammalian gen- 
omes in the UCSC and Ensembl databases for matches to 
HOTAIR exons using Blastn and BLAT [29]. Close 



matches were found in primates but not in other mammals.
For example, although the whole sequence of
HOTAIR showed apparent conservation among mammalian
orthologues (Figure 1A), individually, the five short
exons (exon1 to exon5) returned few hits from mammalian
genomes. Using BLAT, only exons 1, 3 and 4 produced
hits in dog and only exon4 produced hits in cow
between HoxC11 and HoxC12, and all of these hits had
poor scores. This finding suggests that if HOTAIR has 
orthologues in mammals and other vertebrates, they may 
show low sequence conservation. This lack of conservation 
is not a surprise because compensatory mutations occur 
widely in ncRNAs, and many ncRNAs are conserved in 
structure but not in sequence [42,43]. 

HOTAIR exists only in mammals 

Because ncRNAs are characterised by divergent sequences 
and conserved structures, to further address the question 
of whether HOTAIR exists in mammals and other non- 
mammalian vertebrates, we used Infernal to search whole 
genomes for matches to HOTAIR exons. Infernal is a 
local RNA alignment and search tool based on structure 
conservation [23]. To make the covariance model sufficiently
representative, we chose rhesus monkey, a primate
that is more distantly related to human than chimpanzee is,
and dog, a mammal that produced several hits in the
BLAT search. First, we identified sequences orthologous
to HOTAIR exons in rhesus monkey and dog; these
sequences were BLAT search hits with high scores and
successive locations between HoxC11 and HoxC12.
Second, we used the sequences of six human HOTAIR
exons and their orthologues in rhesus monkey and dog to
build six queries using the cmbuild and cmcalibrate functions
of Infernal. Using these queries, we searched the genomes
of 10 placental mammals (human, chimpanzee,
rhesus monkey, gorilla, cow, horse, dog, dolphin, mouse
and rat), the monotreme platypus, and 2 other vertebrates
(chicken and zebrafish). Orthologues of the
HOTAIR exons (hits located between HoxC11 and
HoxC12 with high scores) were obtained in all of the placental
mammals but not in platypus or the other vertebrates
(Figure 1B and Table S1 in Additional file 1).
Notably, each query produced just one high-scoring hit in
the mammalian genomes. These hits were located between
HoxC10/HoxC11 and HoxC12/HoxC13 (HoxC11 or
HoxC12 is absent in some mammals; Figure 1C), and all
of the other hits had low scores. Query2 did not produce
any high-scoring hits in dog, mouse or rat. Moreover,
query6 produced good matches in primates but poor
matches in other mammals, especially in mouse and rat
(Figure 1B, Table S1 in Additional file 1, and Additional
file 2). These results suggest that HOTAIR exists only in
mammals and that, after some evolutionary process, it
became highly conserved in primates.
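The rule used above to call an orthologue (a high-scoring hit lying between the two flanking HoxC genes) can be written down as a small filter. This is a sketch under stated assumptions: the `Hit` record, the score threshold and all coordinates are hypothetical and do not come from the Infernal output of the study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    chrom: str
    start: int
    end: int
    score: float


def best_hit_in_interval(hits, chrom, left, right, min_score):
    """Return the highest-scoring hit lying entirely inside chrom:left-right, or None."""
    inside = [
        h for h in hits
        if h.chrom == chrom
        and left <= min(h.start, h.end)      # hits on the minus strand may have start > end
        and max(h.start, h.end) <= right
        and h.score >= min_score
    ]
    return max(inside, key=lambda h: h.score, default=None)


if __name__ == "__main__":
    # Toy coordinates: the interval stands for the region between the flanking HoxC genes.
    hits = [Hit("chr12", 1500, 1620, 95.0), Hit("chr12", 90000, 90110, 21.0),
            Hit("chr3", 1500, 1620, 30.0)]
    print(best_hit_in_interval(hits, "chr12", 1000, 5000, min_score=50.0))
```

A genome for which the function returns `None` corresponds to the situation reported for platypus, chicken and zebrafish, where only low-scoring hits outside the HoxC interval were found.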



He et al. BMC Evolutionary Biology 2011, 11:102
http://www.biomedcentral.com/1471-2148/11/102






[Figure 1 image. Panel A: UCSC Genome Browser view of chr12 showing HOXC12, HOTAIR and HOXC11, with the tracks "Placental Mammal Basewise Conservation by PhyloP" and "Multiz Alignments of 46 Vertebrates". Other panels: HOXC13, HOXC12, HOTAIR, HoxC11 and HoxC10 in human, chimp, gorilla, rhesus, cow, horse, dolphin, dog, mouse, rat and kangaroo.]



Figure 1 Sequence conservation of HOTAIR orthologues in mammals. (A) The sequences of HOTAIR orthologues are obviously conserved in
primates but less well conserved in other animals (from UCSC Genome Browser). (B) Orthologues of the HOTAIR exons exist only in mammals.
Exon1, exon3, exon4, exon5 and domain B of exon6 are better conserved than exon2, exon6 and domain A of exon6 (indicated by the darkness
of the boxes). Note that the sequence of the exon6 orthologue is significantly shorter in rat than in other mammals and contains just two
domains. The two boxes under each exon6 are domain A (right side) and domain B (left side), linked by a double line indicating a gap of 130
bp (unmatched part in the Infernal search). The gaps in exon6 of dolphin and dog also indicate unmatched parts in the Infernal search. The
double slashes in the schematic of the dolphin gene indicate long introns. (C) The order and orientation of HOTAIR and its neighbouring HoxC
genes in mammals. X: HoxC is absent.



The downloaded genome data were released in 
Ensembl and UCSC at the same time, except that the 
platypus data were released in Ensembl in Dec 2005 and 
in UCSC in Mar 2007. To check whether different assemblies
affect the genome search results, we downloaded the platypus
genome data from UCSC and repeated the Infernal
search. The obtained results were basically the same as
those obtained from the platypus data in Ensembl. 

Fragments of HOTAIR exons are widely found in
mammalian and non-mammalian vertebrate genomes 

Apart from the single high-scoring hit located between HoxC11
and HoxC12, low-scoring hits of the short queries (110 bp to
120 bp for query1 to query4 and 64 bp for query5) were
widely obtained in mammalian and other vertebrate genomes.
These hits matched a fraction of a HOTAIR exon,
and it is unclear whether they contained any functional
element. However, the 1,804 bp query6 produced few
low-scoring hits in mammals and other vertebrates. Because
the best hit was less conserved in non-primate mammals
and much shorter in mouse and rat (Figure 1B), we
inferred that the functional domain(s) conserved in mammals
should be much shorter than 1,804 bp. Further
searches addressed this issue. The rat orthologue of exon6
was only 622 bp and was separated in the middle by an
unmatched gap of 130 bp. In mouse, there was a similar
gap of 150 bp at the same position. This gap, therefore,
divided the highly conserved initial 622 bp of query6 into
two domains (Figure 1B). We extracted the two domains
from the human, rhesus monkey and dog genomes and
built query6a and query6b, respectively, as described
above. As expected, searches of the 13 genomes with
query6a and query6b produced more hits, but no new
high-scoring hits were obtained. This result suggests that
the two domains of exon6, which could be the backbone
of HOTAIR, are not shared by other lncRNAs. While
orthologues of domains A and B of exon6 were equally
conserved in primates, orthologues of domain A were
much less conserved in other mammals, especially in
rodents (Figure 1B). Thus, the two domains may undergo
different evolutionary processes or dynamics. Query6a and
query6b also produced some hits with moderate or low
scores. Many of these hits matched either two specific
fragments in query6a (from approximately 50 bp to
100 bp and from 130 bp to 180 bp) or a specific fragment
in query6b (from approximately 160 bp to 210 bp).
Whether these fragments are essential parts of the two
domains and whether they are functional in vertebrates
are unclear. Using liftOver in UCSC, we checked whether
hits show syntenic relationships among animals, and
found that the coordinates of many hits, possibly in
non-annotated regions, could not be converted.

Hits of queries show distinct distributions in genomes

Experimental studies have revealed that both HOTAIR 
and Xist bind to Ezh2 [44] and that HOTAIR contains 
at least two functional domains. The 5' domain binds
Suz12, a component of polycomb repressive complex 2
(PRC2), whereas the 3' domain binds LSD1 [17].
Because these proteins, especially the components of
PRC2, are bound by many lncRNAs, we speculated that
not all of the low-scoring hits were functionally irrelevant.
A popular method to roughly determine whether a
DNA sequence is functional is to evaluate its conserved 
context [45-47]. We examined the hits of all of the 
queries in human, chimpanzee, mouse, rat and platypus. 
To evaluate the distribution of hits in the genomes and 
determine if they were flanked by genes with the same 
annotation, for each query and each genome, we 
counted the following: (a) the total number of hits, (b) 
the number of hits within introns, (c) the number of 
hits within novel transcripts or known ncRNAs, (d) the 
number of hits in exons of protein coding genes, (e) the 
number of hits antisense to a gene, (f) the number of 
hits in intergenic regions, and (g) the number of hits in 
the 3'UTR or 5'UTR of genes (Table 1). Essentially, no 
hits fell within exons of protein coding genes, and a 
majority of the hits were intergenic. Nevertheless, a few 
observations should be noted. First, no hits were found 
within Xist. Second, some hits fell within novel tran- 
scripts or known ncRNAs, highlighting the possibility 
that there could be functional elements in these tran- 
scripts or ncRNAs. Third, many hits fell within introns 
of protein coding genes. Finally, although query4 was
the same length as query1, query2 and query3, and
query5 was even shorter, query4 and query5 produced
significantly fewer hits in all mammals. This high variance
cannot be accounted for simply by random hits.
One potential explanation is that exon1, exon2 and
exon3 may contain functional elements that are shared
by other ncRNAs and/or distributed more widely.
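The counting scheme (a) to (g) described above can be sketched as a small tally. The snippet assumes that each hit has already been annotated with one or more of the category letters b to g; how that annotation is obtained is not shown, and the example records are invented.

```python
from collections import Counter

# Category letters follow the list above; "a" (total) is derived, never annotated.
LABELS = {"b", "c", "d", "e", "f", "g"}


def tally_hits(hits):
    """hits: iterable of (query, genome, labels). Returns {(query, genome): Counter}."""
    table = {}
    for query, genome, labels in hits:
        unknown = set(labels) - LABELS
        if unknown:
            raise ValueError(f"unknown category: {sorted(unknown)}")
        row = table.setdefault((query, genome), Counter())
        row["a"] += 1        # every hit counts once towards the total
        row.update(labels)   # a hit may fall into several categories
    return table


if __name__ == "__main__":
    # Invented records: an intronic antisense hit and two intergenic hits.
    hits = [("query1", "human", {"b", "e"}),
            ("query1", "human", {"f"}),
            ("query1", "mouse", {"f"})]
    for key, row in sorted(tally_hits(hits).items()):
        print(key, dict(row))
```

Because one hit can be both intronic and antisense to a gene, the counts for (b) to (g) need not add up to the total (a).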

Flanking genes in different animals often reflect the 
evolutionary conservation of a DNA sequence. We speci- 
fically examined the flanking genes of each hit of queryl, 
query6a and query6b. Queryl had 32 hits flanked by the 
same genes in human and chimpanzee but just 1 hit 
flanked by the same gene in mouse and rat. Moreover, 
no hit of queryl was flanked by the same genes in all 
four mammals. Query6a had 14 hits flanked by the same 
genes in human and chimpanzee, but none in mouse and 
rat. Consistent with high conservation, query6b had 21 
hits flanked by the same genes in human and chimpanzee 
but none in mouse and rat. As mouse and rat have an 
evolutionary distance (divergence time) at least 4 times 
that of human and chimpanzee [37], these results indi- 
cate that hits of these queries have moderately conserved 
distributions in mammalian genomes. 
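The flanking-gene comparison can be illustrated with a set intersection. The sketch assumes that every hit has been reduced to the pair of genes on either side of it and that orthologous genes carry the same symbol in both species; the gene names below are placeholders, not data from the study.

```python
def shared_flanks(hits_a, hits_b):
    """Return the flanking-gene pairs that occur around hits in both species.

    hits_a, hits_b: lists of (upstream_gene, downstream_gene), one tuple per hit.
    The order of the two genes is ignored so that inverted regions still match.
    """
    pairs_a = {frozenset(pair) for pair in hits_a}
    pairs_b = {frozenset(pair) for pair in hits_b}
    return pairs_a & pairs_b


if __name__ == "__main__":
    # Placeholder gene symbols for two species.
    species_1 = [("GENE1", "GENE2"), ("GENE3", "GENE4"), ("GENE5", "GENE6")]
    species_2 = [("GENE2", "GENE1"), ("GENE3", "GENE9")]
    common = shared_flanks(species_1, species_2)
    print(len(common), sorted(sorted(pair) for pair in common))
```

Counting the shared pairs per query and per species pair gives figures of the kind quoted above, for example the number of query1 hits flanked by the same genes in human and chimpanzee.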



Table 1 Distribution of hits of queries in human, 
chimpanzee, mouse, rat and platypus 
a: the total number of hits, b: the number of hits in introns, c: the number of
hits in novel transcripts or known ncRNAs, d: the number of hits in exons of
protein coding genes, e: the number of hits antisense to a gene, f: the 
number of intergenic hits, g: the number of hits in the 3'UTR or 5'UTR of 
genes. 



Orthologous sequences of HOTAIR exons show different 
evolutionary dynamics 

Most protein coding genes are produced by gene duplica- 
tion followed by neofunctionalisation and/or subfunctio- 
nalisation. Because an increasing number of ncRNAs 
have been identified, the mechanisms through which 
these ncRNAs form and evolve are of great interest. 
HOTAIR comprises five short exons and one long exon.
Although its origin remains obscure, some exons are
apparently less conserved than others. We therefore analyzed
the molecular evolution of the HOTAIR exons.
Using the concatenated sequences orthologous to exon1,
exon3, exon4 and exon5 and the sequences orthologous
to exon6, we built two phylogenetic trees using Phylip
[33] (Figure 2). Assuming that nucleotide substitutions
followed the HKY85 model [34] and that rates of nucleotide
substitution varied among sites, we analyzed sequences
of HOTAIR exons using Paml. We first compared the
two trees under multiple conditions. Under all of the
conditions examined, nearly the same log-likelihood
(lnL) and other parameters were obtained for the two
trees. For example, if nucleotide substitution rates were
variable among sites and the molecular clock was allowed
to vary from branch to branch, tree A produced a slightly
larger log-likelihood (-759.53 vs. -774.33) for exon1 but
slightly smaller log-likelihoods for the other exons
(-13949.07 vs. -13940.2 as the summed values). Because
exon6 is the main body of HOTAIR, we then chose tree
B to perform evolutionary analysis. We examined
whether nucleotide substitution rates varied among sites
in the exons using the log-likelihood ratio test, a statistical
test for comparing two nested models [35]. The smallest
2ΔlnL = 2 × ((-11774.82) - (-11791.17)) = 32.70 was obtained
from orthologous sequences of exon6 (-11774.82 for the
HKY85+gamma model and -11791.17 for the HKY85
model). The probability distribution of the test statistic can be
approximated by a chi-square distribution with one
degree of freedom, whose 0.5% critical value is 7.88; the
test therefore supports the
model of variable nucleotide substitution rates among sites. Further



analysis revealed that the sequences of the orthologues of
different exons had different transition/transversion rate
ratios (κ), different shape parameters of the gamma distribution
(α), and different nucleotide substitution rates
between clades (Table 2). Because exon1, exon2 and
exon6 had α > 1, most sites in these exons
should have moderate substitution rates, but a few sites
had fast or slow rates of substitution. Because exons 3, 4
and 5 all had α < 1, most sites in these exons should have
low substitution rates. In addition, the values of α in
exon6 (α > 1 for domain A and α < 1 for domain B) indicated
that domain B was more conserved than domain A,
which agrees with the Infernal results in which the scores
of the hits to query6a in non-primate mammals were
lower than the scores of hits to query6b (Figure 1B and
Table S1 in Additional file 1). These results indicate
asynchronous evolution of orthologous sequences of
HOTAIR exons in mammals.
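The test of rate variation among sites reported for exon6 can be reproduced from the two log-likelihoods quoted in the text. The sketch below uses only the Python standard library; the function names are our own, and the closed form for the upper tail holds only for one degree of freedom.

```python
import math


def lrt_statistic(lnl_general, lnl_null):
    """Twice the log-likelihood difference between a general model and its nested null."""
    return 2.0 * (lnl_general - lnl_null)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    # Exon6: HKY85+gamma (general) versus HKY85 with one rate for all sites (null).
    stat = lrt_statistic(-11774.82, -11791.17)
    print(f"2*delta(lnL) = {stat:.2f}")
    print(f"exceeds the 0.5% critical value 7.88: {stat > 7.88}")
    print(f"approximate p-value: {chi2_sf_df1(stat):.2e}")
```

The single degree of freedom corresponds to the one extra parameter of the HKY85+gamma model, the shape parameter of the gamma distribution.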

To examine HOTAIR evolution in more detail, we also 
investigated whether nucleotide substitution rates varied 
among clades. First, we performed a log-likelihood ratio 
test to determine whether the HKY85 model would fit 
the data better with or without a global clock. The smallest
2ΔlnL = 2 × ((-11774.82) - (-11814.58)) = 79.52, against a
0.5% chi-square critical value of 21.96 (this log-likelihood ratio test has eight degrees of
freedom), was obtained from orthologous sequences of
Figure 2 Phylogeny of HOTAIR. (A) A tree built with concatenated sequences of orthologues of exon1, exon3, exon4 and exon5. (B) A tree
built with sequences of exon6 orthologues. C1 indicates the first local clock, while C2a, C2b and C2c indicate the second local clock inserted at
three different places in different computations.
Table 2 Some estimated parameters of molecular evolution

      Exon1      Exon2     Exon3     Exon4     Exon5     Exon6     DomainA   DomainB

κ     1.81901    2.88324   4.85122   1.92371   5.90265   2.18681   1.50398   1.45864

α     233.20*    137.621   0.66662   0.66759   0.70397   3.14532   2.04029   0.70016

      9.34444    1.88055   0.46272   2.23894   1.05430   1.05613   0.43880   2.78380

      7.19374    0.83409   0.46463   0.99973   1.78445   1.15327   0.45130   0.92838

      6.65571    0.89836   1.00285   1.01670   0.45319   0.47190   0.55024   2.92595
      0.39720    0.42971   0.71267   0.45403   0.59854   0.47959   0.74729   1.02247

#     15.75779   2.09060   1.40709   2.23928   0.77385   0.98396   0.73568   2.85161
      0.45376    0.25278   0.03274   0.11204   0.21409   0.47428   0.74774   0.39388

To obtain stable local clock estimations, only two local clocks were specified in each run (see Figure 2B) and the remaining species had rate r0 = 1. *: a large
unstable value. #: r2c was not stable because the clade (mouse, rat) was too close to the root.



exon6 (-11774.82 for the HKY85+gamma model without
a global clock and -11814.58 for the HKY85+gamma
model with a global clock). This result clearly rejected
the global clock hypothesis. Then, we set two local clocks
to investigate whether the exons evolved at different rates
in mammals (Figure 2B). For exon1, exon2, exon4 and
domain B of exon6, the substitution rates in primates
were significantly higher than those in the other group of
mammals; for exon3, exon5 and domain A of exon6, the
substitution rates were not much different between the
two groups (Table 2). As the 5' domain of HOTAIR has
been found to bind to Suz12 and the 3' domain to LSD1
[17], whether the accelerated evolution of exon1, exon2
and domain B of exon6 in primates is related to
their protein binding function awaits further investigation.
In addition, the frequencies of nucleotide substitutions
(πA, πT, πC, πG) varied significantly for different
branches and at different nodes. Taken together, these
results suggest that HOTAIR may be a relatively new
gene, with some exons having recently undergone
accelerated evolution.
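The eight degrees of freedom of the global clock test are consistent with the usual counting argument, which we state here as an assumption about how the number was obtained: for n species, an unrooted tree without a clock has 2n - 3 branch lengths and a clock tree has n - 1 node ages, so the models differ by n - 2 parameters, which is 8 for the 10 mammals. The sketch below checks the reported statistic against a chi-square distribution with an even number of degrees of freedom, for which the upper tail has a closed form.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even number of df.

    For df = 2k the tail equals exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    n_species = 10
    df = n_species - 2  # (2n - 3) branch lengths minus (n - 1) node ages
    stat = 2.0 * (-11774.82 - (-11814.58))  # no clock versus global clock, exon6
    print(f"df = {df}, 2*delta(lnL) = {stat:.2f}")
    print(f"approximate p-value: {chi2_sf_even_df(stat, df):.2e}")
```

With these numbers the statistic lies far above the 0.5% critical value of 21.96, in line with the rejection of the global clock stated in the text.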

HOTAIR evolves faster than its neighbouring HoxC genes 
in mammals 

HoxC genes exist widely in vertebrates; HOTAIR, in con- 
trast, exists only in mammals. It was therefore interesting 
to determine whether HOTAIR evolved faster than the 
neighbouring HoxC genes. Because HoxC11 is absent in
rat, dolphin and platypus and the long exon of human
HoxC12 is absent in some mammals, we compared the
evolution of the short exon of HoxC12 with the evolution
of the main part of exon6 of HOTAIR in 22 mammals (see
Data in Methods). We used sequences from the UCSC database
that were aligned by Multiz and analysed them with EvoNC, a program for
detecting selection in noncoding regions of nucleotide
sequences [25]. For protein coding sequences,
the ratio of nonsynonymous to synonymous substitution rates is
used to detect selection pressure and positive/negative
selection. To apply such detection to noncoding sequences,
the rate of substitution relative to the rate of synonymous
substitution in coding sequences can be modelled by the
parameter δ. δ = 1 indicates that a site in a noncoding
sequence evolved neutrally, whereas δ < 1 and δ > 1 suggest
negative and positive selection, respectively [25]. We
concatenated the aligned orthologous region of HOTAIR
and the aligned short exon of HoxC12 and analyzed the
concatenated sequences. The results are shown in Table 3.
The log-likelihood ratio test clearly rejected the null hypothesis
that the HOTAIR region evolved neutrally, and the value
of 4.1694 found for δ2 in the three-category case strongly
suggested that the HOTAIR region was under positive
selection and evolved faster than HoxC12. The exact driving
force behind this positive selection remains to be
elucidated.
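To make the reading of the three-category estimates concrete, the sketch below labels each site class by its relative substitution rate (named `delta` in the code) and computes the log-likelihood ratio statistic against the neutral model. The numbers are our reading of Table 3, whose extracted digits are partly garbled, so they should be treated as assumptions; no p-value is given because the degrees of freedom are not stated in the text.

```python
# (delta, proportion of sites) for the three-category model, as read from Table 3.
THREE_CATEGORY = [(0.0362, 0.3571), (1.00, 0.4358), (4.1694, 0.2071)]
LNL_NEUTRAL = -3665.64         # neutral model
LNL_THREE_CATEGORY = -3645.42  # three-category model (assumed reading of the table)


def selection_label(delta, tol=1e-9):
    """Classify a site class by its rate relative to synonymous sites in the coding region."""
    if abs(delta - 1.0) <= tol:
        return "neutral"
    return "positive selection" if delta > 1.0 else "negative selection"


if __name__ == "__main__":
    for delta, proportion in THREE_CATEGORY:
        print(f"delta = {delta:<7} p = {proportion:<7} {selection_label(delta)}")
    total = sum(p for _, p in THREE_CATEGORY)
    print(f"proportions sum to {total:.4f}")
    stat = 2.0 * (LNL_THREE_CATEGORY - LNL_NEUTRAL)
    print(f"2*delta(lnL), three-category versus neutral = {stat:.2f}")
```

Under this reading, roughly one fifth of the sites in the HOTAIR region fall into the class with a rate above the synonymous rate, which is the basis for the statement about positive selection.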

Structure prediction reveals two domains with invariable 
sequences and structures 

As many lncRNAs, including Xist and HOTAIR, can
interact with both polycomb proteins and DNA sequences,
it is important to identify the sequences and structures of
their functional domains [48]. An lncRNA may have a
conserved backbone and/or functional domains but have
varied structure in different species, making the determination
of the accurate structure of the full lncRNA difficult
and sometimes unnecessary. So, instead of attempting
to predict the structure of the full HOTAIR, we focused 
on determining the sequence and structure of possible 
functional domains in its exons. Because the orthologous 
sequences of each HOTAIR exon were obtained using 
structure-based genome searches, they had the same 
structures as the queries built by PMmulti and Infernal. 
Because each query produced only one high-scoring hit 
located between HoxC11 and HoxC12, the structures of 
queries determined by PMmulti and Infernal should be 
reasonable. To facilitate the determination of the sequence 
and structure of possible functional domains, two con- 
straints were used. First, in the consensus structure of 
each query determined by PMmulti and used by Infernal, 
functional domains should be occupied by sequences conserved 
in the 10 mammals. Second, in all of the possible 
structures of an exon's orthologue in a mammal predicted 
by other tools, functional domains should have invariant 
sequences and structures. 

Table 3 Log-likelihood values and parameter estimates given by EvoNC 

Model            Likelihood   κ      ω      ζ0       p0       ζ1       p1       ζ2       p2
Neutral          -3665.64     3.74   0.60   0.0834   0.6070   1.00     0.3930
Two category     -3650.17     3.37   0.60   0.2964   0.6439   3.3412   0.3561
Three category   -3645.42     3.48   0.60   0.0362   0.3571   1.00     0.4358   4.1694   0.2071

Because a 5' domain of HOTAIR binds PRC2 [17], we 
assumed that the 5' domain should be conserved in mammals. 
As query2 did not produce any high-scoring hits 
in dog, mouse or rat, we tried to identify the functional 
domain in the predicted structures of exon1. PseudoViewer 
shows that the consensus structure for exon1 consists of 
one big arc and three substructures (Figure 3A and 
Figure S1 in Additional file 1). The bottom substructure 
contains three small loops in some mammals, but is a 
large loop in cow, dolphin, mouse and rat; therefore it is 
unlikely to be a functional domain. The middle sub- 
structure contains three tiny loops and the top substruc- 
ture contains a hairpin at its end in all animals, which 
indicates that they could be or contain the functional 
domain. To obtain more results to aid in the determination, 
Mfold was used to predict structures of exon1's 
orthologous sequence in each mammal [27]. Mfold pre- 
dicted 9 structures in human, 9 in chimpanzee, 14 in 
rhesus monkey, 3 in gorilla, 2 in cow, 3 in dog, 2 in dol- 
phin, 42 in horse, 1 in mouse and 3 in rat (Figure S2 in 
Additional file 1). Notably, the hairpin was found at the 
same position in 7 of the 9 predicted structures in 
human, and in the other 2 cases its sequence was 
embedded within neighbouring sequences (Figure 3BCD 
and Figure S2 in Additional file 1). Similar results were 
obtained from other animals. These Mfold-predicted 
structures provide valuable and complementary information 
for determining the possible position and structure 
of the functional domain in exon1. 
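
The comparison described above amounts to asking whether the base pairs of a candidate fragment recur in each alternative predicted structure. A minimal sketch of such a check follows. It assumes that the predicted structures are available in dot-bracket notation (an assumption; the native output of the prediction tools may need converting first) and uses short made-up structures rather than HOTAIR data. All function names are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) paired positions (0-based) in a dot-bracket string."""
    stack, pairs = [], set()
    for pos, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def fragment_pairs(dot_bracket, start, end):
    """Base pairs lying entirely inside the half-open window [start, end)."""
    return {(i, j) for i, j in base_pairs(dot_bracket) if start <= i and j < end}


def count_occurrences(reference, candidates, start, end):
    """Count candidates whose pairing inside the window equals the reference's."""
    target = fragment_pairs(reference, start, end)
    return sum(fragment_pairs(c, start, end) == target for c in candidates)


# Toy example: a hairpin occupying positions 10-19 of a 20-nt structure.
reference = "((......))(((....)))"
candidates = [
    "..........(((....)))",  # hairpin intact, remainder unpaired
    "(........)(((....)))",  # hairpin intact, different 5' pairing
    "((......)).((....)).",  # hairpin altered (outer pair lost)
]
print(count_occurrences(reference, candidates, 10, 20))
```

Exact equality of base pairs is a strict criterion. Hairpins that are only slightly varied between species would need a tolerance instead, for example a minimum fraction of shared pairs.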

According to an experimental study, a 3' domain of 
HOTAIR, located from approximately 1500 bp to 2146 
bp, binds LSD1 [17]. However, Infernal produced short 
sequences for the exon6 orthologues in mouse and rat 
(1,500 bp and 622 bp, respectively), which did not include 
the 3' end reported in the human sequence. Postulating 
that the 3' functional domain should be conserved in 
mammals and might not be as long as 622 bp, we ana- 
lyzed the structures of domain A (560 bp to 800 bp) and 
domain B (950 bp to 1,190 bp) of exon6 in the 10 mam- 
mals. As stated previously, the structure determined by 
PMmulti and used by Infernal was compared with all of 
the structures predicted by Mfold. We first examined 
domain B. Mfold predicted 3 structures for domain B in 
human, 6 in chimpanzee, 6 in rhesus monkey, 8 in 




Figure 3 Predicted structures of exon1 orthologues in 
mammals. (A) The structure predicted by PMmulti and used by 
Infernal. This consensus structure consists of one big arc and three 
substructures. In some mammals, the bottom substructure contains 
three small loops, but in cow, dolphin, mouse and rat, it is a big 
loop. The middle substructure contains three tiny loops and the top 
substructure contains a hairpin at its end in all animals. (B) Two 
structures predicted by Mfold in human. Although the overall 
structures are different, the hairpin structure found in the 
PMmulti-predicted structure invariably occurs in both structures. (C) Two 
structures predicted by Mfold: one in cow and one in dog. The 
sequence and its hairpin structure (slightly varied) occur in both 
structures. 

gorilla, 2 in cow, 13 in dog, 5 in dolphin, 7 in horse, 9 in 
mouse and 7 in rat. We first checked those in human and 
rat, the two mammals with the greatest evolutionary dis- 
tance, and found that a conserved GC-rich paired frag- 
ment existed in structures predicted for all 10 mammals 
and that it closely matched the marked part (the circled 
area) in the structure predicted by PMmulti (Figure 4A 
and Figure S3B in Additional file 1). We then examined 
all 66 structures of domain B predicted by Mfold for the 
10 mammals and found that in 4 cases, the GC-rich 
paired fragment had the structure shown in Figure 4A 
(human), in 39 cases it had the structure shown in Figure 
4B (chimpanzee), in 11 cases it had the structure shown 
in Figure 4C (cow), and in 7 cases it had the structure 
shown in Figure 4D (dog). Compared with the predicted 
functional domain in exon1, this specific structure 
was more clearly present in domain B. In contrast, domain 
A of exon6 was much less conserved (Figure 1B and 
Table S1 in Additional file 1) and was GC poor (data not 
shown), without a clear consensus substructure in the structures 
predicted using Mfold. 
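
The counts reported for domain B can be tallied directly. The sketch below sums the number of Mfold structures per species and the number of structures assigned to each of the four forms of the GC-rich fragment, using only the numbers given in the text. The variable names are illustrative.

```python
# Number of Mfold structures predicted for domain B in each mammal (from the text).
domain_b_structures = {
    "human": 3, "chimpanzee": 6, "rhesus monkey": 6, "gorilla": 8, "cow": 2,
    "dog": 13, "dolphin": 5, "horse": 7, "mouse": 9, "rat": 7,
}

# Structures in which the GC-rich paired fragment takes each form (Figure 4A-D).
fragment_forms = {"4A": 4, "4B": 39, "4C": 11, "4D": 7}

total = sum(domain_b_structures.values())
classified = sum(fragment_forms.values())

print(f"total structures: {total}")
print(f"structures assigned to a form: {classified}")
print(f"not assigned in the text: {total - classified}")
```

The per-species counts sum to the 66 structures mentioned, whereas the four forms account for 61 of them. The text does not describe the remaining 5 structures.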

The sequence and structure of the two domains occur 
nearly invariably in structures of full HOTAIR 

Although focusing on the sequence and structure of conserved 
(and potentially functional) domains is reasonable, 
the structure of an isolated RNA fragment can differ greatly 
from its structure when embedded in a longer sequence. To 
validate the sequence and structure of the two conserved 
fragments in exon1 and domain B of exon6, we predicted 
structures of the full HOTAIR in all the mammals. The 
predicted sequence and structure of the fragment in 
exon1 occurs in many structures of full HOTAIR; 
remarkably, the predicted sequence and structure of the 
fragment in domain B of exon6 occurs in most structures 
of full HOTAIR. For example, Mfold produced 29 and 37 
structures for human and rat full HOTAIR respectively. 
In humans, the predicted structure of the fragment in 
exon1 occurs in 8 of 29 full HOTAIR structures, and the 
predicted structure of the fragment in domain B of exon6 
occurs in 20 of 29 full HOTAIR structures. In rats, the 
predicted structure of the fragment in exon1 occurs in 33 
of 37 full HOTAIR structures, and the predicted struc- 
ture of the fragment in domain B of exon6 occurs in 31 
of 37 full HOTAIR structures (Additional file 3). Given 
the length of HOTAIR and the number of its predicted 
structures, these results strongly support the predicted 
functional fragment in domain B of exon6. The next step 
should be to experimentally validate these structures and 
their functions. 
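
To make the comparison between species and fragments easier to read, the occurrence counts given above can be expressed as fractions. This is a small illustrative calculation on the reported numbers only; the dictionary layout is an arbitrary choice.

```python
# (structures containing the fragment, total full-length structures), from the text.
occurrences = {
    ("human", "exon1"): (8, 29),
    ("human", "exon6 domain B"): (20, 29),
    ("rat", "exon1"): (33, 37),
    ("rat", "exon6 domain B"): (31, 37),
}

fractions = {key: hits / total for key, (hits, total) in occurrences.items()}

for (species, fragment), frac in fractions.items():
    print(f"{species:5s} {fragment:15s} {frac:.1%}")
```

On these numbers the domain B fragment recurs in a majority of full-length structures in both species, whereas the exon1 fragment recurs in a minority of the human structures.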

Discussion 

Except for Xist, the origin, evolution, structure and phylogenetic 
distribution of lncRNAs have barely been 
investigated. Three observations suggest that HOTAIR has 
conserved structures but divergent sequences: BLAT failed 
to find orthologous sequences of HOTAIR exons in mammals, 
some exons are missing in some mammals, and gaps exist in 
many mammals in the sequences of exon orthologues identified 
using the RNA homology search software Infernal. 
This feature should be common to 
lncRNAs rather than being unique to HOTAIR [N1,N2]. 
For example, XIST contains both rapidly evolving 
sequences and highly conserved domains [49]. What constrains 
lncRNA evolution is poorly understood. As they 
interact with both the PRC2 complex and specific DNA 
sequences, co-evolution with specific DNA sequences 
should be an important aspect. Compared with lncRNAs 
functioning in cis to regulate local genes, the evolutionary 
constraints on HOTAIR, which functions in trans, are more 
intriguing. Because the Infernal search produced just one 
high-scoring hit in each placental mammal, and this hit 
was located between HoxC11 and HoxC12, it can be 
inferred that HOTAIR exists in eutherians and that it is 
younger than its neighbouring Hox genes. How HOTAIR 
originated remains unclear. Phylogenetic analysis 
revealed that within the relatively young gene, HOTAIR 
exons had asynchronous evolutionary dynamics and 
some exons had undergone an accelerated evolutionary 
process in primates. These results indicate positive selec- 
tion during HOTAIR's evolution. Accelerated evolution 
is also supported by the comparison between HOTAIR 
exon6 and the short exon of HoxC12, which clearly 
showed that the HOTAIR exon evolved significantly fas- 
ter than the HoxC12 exon. Structure prediction for the 
orthologous sequences in 10 mammals showed two fragments 
in exon1 and domain B of exon6 with invariant 
base pairing and 2D structure (Figure 3 and Figure 4). 
These fragments, located at the 5' end and close to the 3' 
end of HOTAIR, respectively, could be functional 
domains of HOTAIR. 

Each query based on a HOTAIR exon produced only 
one high-scoring hit in the genome of each mammal, 
and this hit was located between HoxC11 and HoxC12, 
the location of HOTAIR in the human genome. However, 
many low-scoring hits were found in other places in 
mammalian and other vertebrate genomes. Because a 
considerable number of lncRNAs are believed to interact 
with polycomb proteins to conduct tissue-specific genome 
modification, we anticipated, for several reasons, 
that the Infernal search would identify some consensus 
sequences for polycomb protein binding in genomes, like 
the TATA box in promoters and the homeobox in Hox 
genes, that are shared by other lncRNAs. First, the four 
families of Hox genes have demonstrated complex cross 
regulation and compensation during embryogenesis 
[50,51], which suggests that multiple HOTAIR-like 
lncRNAs may be needed to mediate negative feedback 

Figure 4 Predicted structures of orthologues of domain B of exon6 in mammals. (A) The structure predicted by PMmulti and used by 
Infernal. The circled part was identified by comparing the structure with the structures predicted by Mfold based on the position and base 
pairing of sequence. (B) One structure predicted by Mfold in chimpanzee; the circled part is nearly the same as that predicted for the human 
sequence. (C) One structure predicted by Mfold in mouse; the circled part is slightly different but still occurs at one end. (D) One structure 
predicted by Mfold in dog; the circled part is embedded within other sequences. 

and dosage balances among Hox genes. Second, Hox 
genes participate in diverse cell fate determination and 
reprogramming [52,53], which suggests that Hox-related 
lncRNAs may mediate genome modification at multiple 
loci. Third, a recent study revealed that both HOTAIR 
and the RepA ncRNA within Xist bind to the PRC2 complex 
[44], although it is unclear whether the binding 
domains are similar. Moreover, because multiple important 
proteins, such as Nanog, Oct4 and Sox2, also interact 
with Xist [54], the scope of lncRNA functions should 
be large. These facts make it theoretically plausible that 
many lncRNAs share the same or 
similar functional domains with HOTAIR. However, 
except for one high-scoring hit, no hits with moderate 
scores were obtained. To what extent lncRNAs maintain 
conserved function with evolved sequences is poorly 
understood. It is unlikely that all of the low-scoring hits 
are random hits, because query4 and query5, which are 
equal to and shorter than query1 in length, respectively, 
produced far fewer 
hits. In addition, some low-scoring hits fell within novel 
transcripts and unknown ncRNAs, and many were within 
introns of or antisense to protein-coding genes (Table 1), 
which is consistent with the findings that many lncRNAs 
(like AIR and Kcnq1ot1) are antisense to protein-coding 
genes [11,55]. Global transcriptome analysis has revealed 
that a large proportion of the genome can produce transcripts 
from both strands [2] and antisense transcription 
is believed to have roles in gene regulation. Meanwhile, 
more than 55,000 completely intronic noncoding RNAs 
have been found to be transcribed from the introns of 
74% of all unique RefSeq genes, which indicates that 
RNAs transcribed from intronic regions of genes have 
distinct regulatory roles and are involved in a number of 
processes [56,57]. Carefully comparing all hits with 
cDNA libraries should produce more information. 

The evolution of lncRNA sequences, including those 
within the vertebrate Hox clusters, has been examined 
recently. These studies reveal that the evolution of many 
lncRNAs is not consistent with the neutral evolution 
model, and purifying selection has acted on their promoters 
and some conserved sites [58,59]. However, 
except for Xist [21,22], the evolution of specific ncRNA 
genes has not been examined. Compared with ancestral 
regions and general intergenic sequences, lncRNA 
sequences have been shown to exhibit lower rates of 
nucleotide substitution, insertion, and deletion, which 
can be interpreted to indicate that they have undergone 
purifying selection [58]. Our analysis of orthologues of 
HOTAIR in 10 mammals covering multiple eutherian 
orders suggests that HOTAIR exons have discrete evolu- 
tionary dynamics, and that some exons evolved signifi- 
cantly faster in primates than in non-primate mammals. 
The analysis of orthologous sequences of HOTAIR 
exon6 and a HoxC12 exon in 22 mammals indicates 
that HOTAIR may have evolved faster than its neighbouring 
HoxC genes. These results suggest that 
HOTAIR may have undergone accelerated evolution 
in eutherians under positive selection. In general, a gene 
with important function should evolve slowly due to 
strong functional constraints; however the opposite is 
often true when the gene is young (in an active neo- 
functionalisation or subfunctionalisation stage). For 
example, young proteins experience more variable selec- 
tion pressures than established proteins [60]. That 
HOTAIR is not found in non-mammalian vertebrates 
and that it has evolved faster than nearby HoxC genes 
both indicate that it is a young gene that formed after 
the two rounds of whole genome duplication. Given 
that most lncRNAs, including Xist, have so far only 
been found in mammals, it is interesting to ask when 
and why these lncRNAs emerged in higher vertebrates to 
mediate genome modification. Because HOTAIR exists 
only in mammals, evolves faster than HoxC12, and has 
exons with discrete evolutionary dynamics, we postulate 
that HOTAIR may have formed ab initio, possibly via 
the activity of transposons. HOTAIR is involved in the 
PRC2-mediated silencing of chromatin at HoxD loci, 
but its main effect in the regulation of Hox gene expres- 
sion may be dosage compensation. In this sense it is 
similar to Xist. The lower effectiveness of dosage com- 
pensation in birds than in mammals and the lack of 
general dosage compensation for sex-linked genes in 
chickens [61,62] may explain why HOTAIR, like Xist, is 
found only in eutherians. 

In this study, the structures of HOTAIR exons were 
predicted using two programs. Without any experimen- 
tal data for structure prediction [48], we adopted a com- 
parative computational approach to predict the 
sequence and structure of conserved functional domains 
of HOTAIR rather than the structure of the full 
HOTAIR sequence in mammals. PMmulti and Mfold 
were used to predict multiple potential structures for 
orthologues of each exon. For example, for exon1, 
Mfold predicted 9 structures in human, 9 in chimpan- 
zee, 14 in rhesus monkey, 3 in gorilla, 2 in cow, 3 in 
dog, 2 in dolphin, 42 in horse, 1 in mouse and 3 in rat. 
If invariant sequence base pairing and 2D structure are 
found in all of the structures predicted using Mfold and 
in the consensus structure predicted using PMmulti, it 
is highly likely that the sequence and structure represent 
a functional domain. Because producing experimental 
data to determine the structures of lncRNAs is 
complicated and time-consuming, the results of our 
structure prediction should be valuable for further 
experimental studies of HOTAIR, and the methods 
should be applicable to studies of other lncRNAs. 

Conclusions 

The lncRNA HOTAIR has poorly conserved sequences 
and considerably conserved structures in 11 examined 
mammals (10 eutherians and 1 marsupial). It shows distinct 
evolutionary features and has evolved faster than 
nearby HoxC genes. Given that exons 1-5 are very short, 
exon1 and a domain of exon6 (1804 bp) are absent in 
kangaroo, and exon2 is absent in mouse, rat and kangaroo, 
a highly conserved 239 bp domain in exon6, which first 
appeared in kangaroo, should be the backbone of 
HOTAIR. These findings suggest the ab initio generation 
of HOTAIR in marsupials. Structure prediction identifies 
two fragments, in the 5' end exon1 and the 3' end domain 
B of exon6 respectively, whose sequence and structure 
occur invariably in various predicted structures 
of exon1, domain B of exon6 and the full HOTAIR. 
These are supported by experimental findings. Comparing 
the origin and evolutionary features of HOTAIR 
with those of Xist suggests that many lncRNAs may have 
first formed in marsupials and then undergone rapid evolution in 
eutherians. An interesting question is whether their 
origin and evolution are intrinsically associated. 

Note added in proof 

During the review of the manuscript, we downloaded, 
searched and analyzed opossum and kangaroo genome 
data (Ensembl released the improved assemblies of kangaroo 
(dipOrd1.60) and opossum (BROAD05.57) in Oct 
2010). A whole-genome search of opossum did not produce 
high-scoring hits. Searching the kangaroo genome with 
query3, query4, query5 and query6 each produced a 
high-scoring hit, and these hits have successive addresses in 
GeneScaffold_2370 (Figure 1B and Additional file 2). Query6's hit 
matches query6 from 33 bp to 655 bp, exactly as in 
rat. Interestingly, exon1 and exon2 were 
absent, domain B of exon6 produced a high-scoring hit, 
and domain A of exon6 was not identified. These results, 
together with the phylogenetic analysis, lead to two suggestions. 
First, HOTAIR, like other lncRNAs, first formed 
in some marsupials and underwent rapid evolution in 
eutherians. Second, domain B of exon6 may be the backbone 
of HOTAIR, because it is the only relatively long 
piece conserved in marsupials and eutherians. 

Monotremes have multiple X chromosomes but it is 
not clear whether they undergo dosage compensation; 
marsupials show dosage compensation but they lack Xist 
[63]. It has been reported that female marsupials may use an ancestral 
dosage compensation mechanism that differs from, 
but shares common properties with, the Xist-based mechanism in 
eutherians [64,65]. Since protein-coding genes that flank 
the eutherian XIC are well-conserved in M. domestica 
and vertebrates and there is a surprising break in synteny 
with eutherian mammals and other vertebrates, it is 
suggested that during the evolution of the marsupial X 
chromosome, one or more rearrangements broke up an 
otherwise evolutionarily conserved block of vertebrate 
genes [66]. The situation of HOTAIR, which is flanked 
by HoxC11 and HoxC12, is not found in non-mammalian vertebrates and 
first occurs in marsupials as revealed in this study, 
seems quite similar to that of Xist. This raises the interesting 
question of whether HOTAIR and Xist, and possibly also 
some other lncRNAs, have undergone the same evolutionary 
process. 

Additional material 



Additional file 1: This file contains Table S1, Figure S1, Figure S2, 
and Figure S3 

Additional file 2: This file contains orthologues of HOTAIR exons 
and their coordinates in mammals 

Additional file 3: This file contains predicted structures of full 
HOTAIR in human and rat 